Mixtures of Rectangles: Interpretable Soft Clustering

Authors

  • Dan Pelleg
  • Andrew W. Moore
Abstract

To be effective, data-mining has to conclude with a succinct description of the data. To this end, we explore a clustering technique that finds dense regions in data. By constraining our model in a specific way, we are able to represent the interesting regions as an intersection of intervals. This has the advantage of being easily read and understood by humans. Specifically, we fit the data to a mixture model in which each component is a hyper-rectangle in M-dimensional space. Hyper-rectangles may overlap, meaning some points can have soft membership of several components. Each component is simply described by, for each attribute, lower and upper bounds of points in the cluster. The computational problem of finding a locally maximum-likelihood collection of k rectangles is made practical by allowing the rectangles to have soft "tails" in the early stages of an EM-like optimization scheme. Our method requires no user-supplied parameters except for the desired number of clusters. These advantages make it highly attractive for "turn-key" data-mining application. We demonstrate the usefulness of the method in subspace clustering for synthetic data, and in real-life datasets. We also show its effectiveness in a classification setting.
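
The soft membership described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tail shape (a per-dimension Gaussian falloff outside the interval), the shared tail width `sigma`, and the function names `log_density` and `soft_memberships` are all assumptions made for illustration. Each component is flat inside its hyper-rectangle and decays outside it, and a point's membership in a component is that component's weighted density divided by the sum over all components.

```python
import numpy as np

def log_density(X, lower, upper, sigma):
    """Log-density of a hyper-rectangle with soft tails (assumed form).

    Per dimension, the density is flat on [lower, upper] and falls off
    like a Gaussian with width sigma outside the interval.
    """
    X = np.asarray(X, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Distance from each coordinate to its interval (0 when inside).
    d = np.maximum(lower - X, 0.0) + np.maximum(X - upper, 0.0)
    # Normalizer: interval width plus the mass of the two half-Gaussian tails.
    log_norm = np.log((upper - lower) + sigma * np.sqrt(2.0 * np.pi))
    return np.sum(-0.5 * (d / sigma) ** 2 - log_norm, axis=1)

def soft_memberships(X, lowers, uppers, weights, sigma):
    """Posterior membership of each point in each rectangle (E-step style)."""
    log_p = np.stack(
        [np.log(w) + log_density(X, lo, up, sigma)
         for lo, up, w in zip(lowers, uppers, weights)],
        axis=1,
    )
    # Subtract the row maximum before exponentiating for numerical stability.
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

# Two overlapping rectangles in 2-D (hypothetical bounds).
lowers = [[0.0, 0.0], [1.0, 1.0]]
uppers = [[2.0, 2.0], [3.0, 3.0]]
points = np.array([[1.5, 1.5],   # inside both rectangles
                   [0.5, 0.5]])  # inside the first rectangle only
print(soft_memberships(points, lowers, uppers, weights=[0.5, 0.5], sigma=0.05))
```

With equal weights and equal-sized rectangles, the point in the overlap is shared equally between the two components, while the point that lies only in the first rectangle is assigned almost entirely to it. Shrinking `sigma` makes the tails sharper, which loosely mirrors the abstract's remark that soft tails are used in the early stages of the optimization.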


Similar resources

Hyper-rectangle-based Discriminative Data Generalization and Applications in Data Mining

The ultimate goal of data mining is to extract knowledge from massive data. Knowledge is ideally represented as human-comprehensible patterns from which end-users can gain intuitions and insights. Axis-parallel hyper-rectangles provide interpretable generalizations for multi-dimensional data points with numerical attributes. In this dissertation, we study the fundamental problem of rectangle-ba...


Demixing and orientational ordering in mixtures of rectangular particles.

Using scaled-particle theory for binary mixtures of two-dimensional hard particles with orientational degrees of freedom, we analyze the stability of phases with orientational order and the demixing phase behavior of a variety of mixtures. Our study is focused on cases where at least one of the components consists of hard rectangles, or a particular case of these, hard squares. A pure fluid of ...


Effective classification of 3D image data using partitioning methods

We propose partitioning-based methods to facilitate the classification of 3-D binary image data sets of regions of interest (ROIs) with highly non-uniform distributions. The first method is based on recursive dynamic partitioning of a 3-D volume into a number of 3-D hyper-rectangles. For each hyper-rectangle, we consider, as a potential attribute, the number of voxels (volume elements) that bel...


New distance and similarity measures for hesitant fuzzy soft sets

The hesitant fuzzy soft set (HFSS), as a combination of hesitant fuzzy and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered as the important tools in different areas such as ...




Publication date: 2001